Usefulness of approximate entropy in the diagnosis of schizophrenia.

Objectives: Diagnosing psychiatric diseases at the first interview is challenging because qualitative criteria are less accurate than quantitative ones. The objective here is to distinguish schizophrenic patients from healthy subjects using a quantitative index derived from their electroencephalogram (EEG) signals. Methods: Ten right-handed male patients with schizophrenia, who had only auditory hallucinations and no other psychotic features, and ten age-matched right-handed normal male control participants took part in this study. The patients used haloperidol to minimize drug-related effects on their EEG signals. Electrophysiological data were recorded using a Neuroscan 24 Channel Synamps system, with a signal gain of 75K (150x at the headbox). Motivated by the observable anatomical differences between the brains of schizophrenic patients and controls, several discriminative features, including autoregressive (AR) coefficients, band power, fractal dimension, and approximate entropy (ApEn), were chosen to extract quantitative values from the EEG signals. Results: The extracted features were applied to a support vector machine (SVM) classifier, which produced 88.40% accuracy in distinguishing the two groups. ApEn provided more discriminative information than the other features. Conclusion: This research presents a reliable quantitative approach for distinguishing control subjects from schizophrenic patients. Other representative features were also implemented, but ApEn produced higher performance owing to the complex and irregular nature of EEG signals.


Introduction
Schizophrenia is a severe and persistent debilitating psychiatric disorder.
Diagnosis of schizophrenic patients is mostly performed based on qualitative criteria. According to the diagnostic criteria of the American Psychiatric Association (DSM-IV) (1), patients show disturbances in thoughts (or cognitions), affects, and perceptions and difficulties in relationships with others. In schizophrenia, a major enduring split exists between affect and thoughts. The hallmark symptoms of schizophrenia are the experiences of hallucinations, often of the auditory type, as well as delusions.
Electroencephalogram (EEG) has been an important clinical tool for the evaluation and diagnosis of brain diseases. The first attempts to apply methods from nonlinear time series analysis to EEG were carried out in the framework of the chaos hypothesis. It was assumed that the EEG within a particular psycho-physiological state could be described by a deterministic chaotic system and could therefore be characterized by invariant measures such as the fractal dimension or Lyapunov exponents. Recently, much attention has been given to the analysis of EEG signals of schizophrenic patients (2). Lee et al. (3) detected non-linearity in schizophrenia with a modified surrogate data method. They showed that the correlation dimension could be used as a discriminating statistic to demonstrate non-linearity in the EEG. Jeong et al. (4) stated that the value of D2 in the left inferior frontal and anterior temporal regions of 13 schizophrenic patients was decreased compared to eight healthy controls. Kim et al. (5) reported a decrease of the first Lyapunov exponent in the frontal regions of 25 schizophrenic patients in comparison with 15 healthy controls.
The disturbances of the normal sleep EEG architecture associated with schizophrenia were also investigated from a nonlinear perspective. Kirsch et al. (6) reported that during the performance of a cognitive task, the D2 of healthy participants' EEG decreased. This change did not occur in patients with schizophrenia performing the same task. Sabeti et al. (7) selected the best frequency bands using a genetic algorithm to classify schizophrenic and control participants.
Approximate entropy (ApEn) is another parameter, recently introduced to quantify regularity in data without any prior knowledge about the system generating them. It was constructed by Pincus, motivated by applications to short and noisy data sets, along thematically similar lines to Kolmogorov-Sinai (K-S) entropy. However, the focus was, in this case, to provide a widely applicable, statistically valid formula that distinguishes data sets by a measure of regularity. The observation motivating ApEn is that if the joint probability measures of reconstructed dynamics that describe each of two systems are different, then their marginal probability distributions on a fixed partition, given by conditional probabilities, are likely different. Typically, orders of magnitude fewer points are needed to accurately estimate these marginal probabilities than to accurately reconstruct the attractor measure defining the process.
Based on numerous studies, ApEn may correlate with hidden changes often undetected by other more classical time series analyses, including spectral analysis and correlation dimension. ApEn changes have often been seen to be predictive of subsequent clinical changes. This has facilitated the application of ApEn to numerous settings both within and outside of biology. Preliminary evidence suggests that the ApEn of EEG is predictive of epileptic seizures (9). It has also been applied to extract features from EEG and respiratory recordings of a patient during Cheyne-Stokes respiration (10) and to quantify the depth of anesthesia (14). The objective of this study is to evaluate the estimated ApEn of schizophrenic patients' EEGs compared to healthy subjects. The remainder of this paper is organized as follows: Section 2 explains the experimental setup, the task and the basic data preprocessing. Section 3 introduces the employed features and Section 4 briefly describes the SVM (12) classifier. Experimental results are given in Section 5. Finally, a discussion and conclusion are presented.

Data acquisition
Ten patients with schizophrenia and ten age-matched control participants (all male, with ages uniformly distributed between 18 and 55 years) participated in this study. They were recruited from the Center for Clinical Research in Neuropsychiatry, Perth, Western Australia. According to DSM-IV criteria (1), the patients were diagnosed as having lifetime schizophrenia or a schizophrenia spectrum disorder. The patients were not divided into subgroups by schizophrenia subtype. The patients used haloperidol to minimize drug-related effects on their EEG signals. The history reports of both groups confirmed that the normal participants did not have any psychotic symptoms and that the patients had only auditory hallucinations and no other psychotic features.
The signals were recorded while the patients were in the remission phase; otherwise, signal recording could not be performed. Each participant was seated upright with eyes open, and the experiment lasted two minutes. Electrophysiological data were recorded using a Neuroscan 24 Channel Synamps system, with a signal gain of 75K (150x at the headbox). For the EEG paradigms, 20 electrodes (Electrocap, 10-20 standard system (13)) were recorded, plus left and right mastoids, VEOG (14) and HEOG (14). Eye-blink artifacts were corrected using the technique proposed in (15), and the data were manually screened for artifacts. EEG data were recorded from 20 electrodes (Fpz, Fz, Cz, Pz, …) at a sampling rate of 200 Hz. Figure 1 shows the head partition and electrode positioning.

Approximate Entropy
ApEn was introduced as a quantification of regularity in sequences and time series data, initially motivated by applications to relatively short, noisy data sets. Mathematically, it is part of a general development of approximating Markov chains to a process. It provides a finite-sequence formulation of randomness via proximity to maximal irregularity. A statistical evaluation of ApEn is available in the literature. ApEn is a scale-invariant feature that reveals both dominant and subordinate (17) information within a time frame. Therefore, ApEn is repeatedly considered an informative feature that can discriminate well between EEG signals of similar diseases (17). Notably, it detects changes in underlying episodic behavior not reflected in peak occurrences or amplitudes. It is applicable to systems with at least 50 data points and to broad classes of models; it can be applied to discriminate both general classes of correlated stochastic processes and noisy deterministic systems.
Moreover, ApEn is complementary to spectral and autocorrelation analyses, providing effective discriminatory capability in instances in which the aforementioned measures exhibit minimal distinctions. It is nearly unaffected by low-level noise, is robust, giving meaningful information with a reasonable number of data points, and is finite for both stochastic and deterministic processes. It measures the logarithmic likelihood that runs of patterns that are close remain close on subsequent incremental comparisons, and assigns a nonnegative number to a time series, with larger values corresponding to more complexity or irregularity in the data. ApEn has two user-specified parameters: a run length m and a tolerance window r. It is therefore important to write ApEn(m, r), or ApEn(m, r, N), where N is the number of points in the time series.
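Formally, given N data points from a time series $\{u(1), u(2), \ldots, u(N)\}$, ApEn is computed through the following steps.
• Form the m-vectors $X(1), \ldots, X(N-m+1)$ defined by $X(i) = [u(i), u(i+1), \ldots, u(i+m-1)]$, $i = 1, \ldots, N-m+1$.
• Define the distance between $X(i)$ and $X(j)$, $d[X(i), X(j)]$, as the maximum absolute difference between their respective scalar components, i.e., the maximum norm: $d[X(i), X(j)] = \max_{k=1,\ldots,m} |u(i+k-1) - u(j+k-1)|$.
• For a given $X(i)$, count the number of $j$ ($j = 1, \ldots, N-m+1$) such that $d[X(i), X(j)] \le r$, denoted $N^m(i)$. Then, for $i = 1, \ldots, N-m+1$, $C_r^m(i) = N^m(i)/(N-m+1)$. $C_r^m(i)$ measures, within a tolerance r, the frequency of patterns similar to a given one of window length m.
• Compute the natural logarithm of each $C_r^m(i)$ and average it over i: $\phi^m(r) = \frac{1}{N-m+1} \sum_{i=1}^{N-m+1} \ln C_r^m(i)$.
• Increase the dimension to $m+1$. Repeat steps 1-4 and find $C_r^{m+1}(i)$ and $\phi^{m+1}(r)$.
• ApEn is then defined as $\mathrm{ApEn}(m, r, N) = \phi^m(r) - \phi^{m+1}(r)$.
For the study discussed in this paper, ApEn is estimated using the widely established parameter values of $m = 2$ and $r = 0.1$ times the standard deviation (SD) of the original data sequence.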
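The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `approximate_entropy` and the parameter `r_factor` are hypothetical, NumPy is assumed to be available, and self-matches are counted, as in the usual definition of ApEn.

```python
import numpy as np

def approximate_entropy(u, m=2, r_factor=0.1):
    """Sketch of ApEn(m, r, N) with r = r_factor * SD of the sequence (assumed defaults)."""
    u = np.asarray(u, dtype=float)
    n = u.size
    r = r_factor * np.std(u)

    def phi(dim):
        # Rows are the vectors X(i) = [u(i), ..., u(i+dim-1)], i = 1..N-dim+1
        count = n - dim + 1
        x = np.array([u[i:i + dim] for i in range(count)])
        # Maximum-norm distance between every pair of vectors
        dist = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=2)
        # C_r^dim(i): fraction of vectors within tolerance r (self-match included)
        c = np.sum(dist <= r, axis=1) / count
        # Average of the natural logarithms over i
        return np.mean(np.log(c))

    return phi(m) - phi(m + 1)
```

For a 2-second window sampled at 200 Hz (400 points) the pairwise distance matrix is small, so this direct quadratic-time form is usually adequate; a regular signal such as a sine wave should give a lower value than white noise of the same length.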

Auto-regressive (AR) coefficients
The AR model is a powerful tool for signal modeling. In this model, each sample is predicted as a weighted sum of previous samples. The number of weights determines the model order. Here, the autoregressive coefficients are estimated by the Burg method (19). The Burg method fits an AR model of order P, shown in equation (6), to the input signal x:
$$x(n) = -\sum_{i=1}^{P} a_i\, x(n-i) + e(n), \qquad (6)$$
where $e(n)$ is the prediction error. The signal modeling is performed by minimizing the forward and backward prediction errors while constraining the AR coefficients, $a_i$, to satisfy the Levinson-Durbin recursion.
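A minimal sketch of Burg's recursion is given below for illustration. It assumes the sign convention x(n) + a_1 x(n-1) + ... + a_P x(n-P) = e(n), which the text does not confirm; the function name `burg_ar` is hypothetical and NumPy is assumed.

```python
import numpy as np

def burg_ar(x, order):
    """Sketch of Burg's method.

    Returns (a, e): coefficients a[0..order] with a[0] = 1 under the assumed
    convention x(n) + sum_i a[i] * x(n - i) = e(n), and the final error power e.
    """
    x = np.asarray(x, dtype=float)
    f = x.copy()                 # forward prediction error
    b = x.copy()                 # backward prediction error
    a = np.array([1.0])
    e = np.dot(x, x) / x.size    # zero-order error power
    for _ in range(order):
        ff = f[1:]               # forward error at time n
        bb = b[:-1]              # backward error at time n - 1
        # Reflection coefficient minimizing forward + backward error power
        k = -2.0 * np.dot(ff, bb) / (np.dot(ff, ff) + np.dot(bb, bb))
        # Levinson-Durbin update of the coefficient vector
        a_ext = np.concatenate([a, [0.0]])
        a = a_ext + k * a_ext[::-1]
        # Update both error sequences for the next order
        f, b = ff + k * bb, bb + k * ff
        e *= 1.0 - k * k
    return a, e
```

Under this convention, a process generated as x(n) = 0.75 x(n-1) - 0.5 x(n-2) + e(n) should yield coefficients close to a_1 = -0.75 and a_2 = 0.5 when enough samples are available.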

Band power
EEG contains several specific frequency components, some of which carry discriminative information. This feature reflects the energy of the alpha, beta, theta and delta bands, which are particularly important for classifying different brain states. First, the EEG signals are filtered by four fifth-order Butterworth band-pass filters at 8-13 Hz (alpha band), 13-30 Hz (beta band), 4-8 Hz (theta band) and 0-4 Hz (delta band). Then, the filtered signals are squared to determine the signal power in each windowed signal.
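For illustration only, the band powers of one window can be approximated from the FFT periodogram. Note that this swaps in a frequency-domain estimate in place of the Butterworth filtering and squaring described above, so the values will not match that procedure exactly. The names `BANDS` and `band_powers` are hypothetical, and the 200 Hz sampling rate is taken from the data acquisition section.

```python
import numpy as np

# Band edges in Hz, as listed in the text
BANDS = {
    "delta": (0.0, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
}

def band_powers(window, fs=200.0):
    """Periodogram-based band power of one windowed signal (illustrative sketch)."""
    x = np.asarray(window, dtype=float)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    # Squared magnitude spectrum, scaled by the squared window length
    power = np.abs(np.fft.rfft(x)) ** 2 / x.size ** 2
    result = {}
    for name, (lo, hi) in BANDS.items():
        # Sum the spectral power of all bins with lo <= f < hi
        mask = (freqs >= lo) & (freqs < hi)
        result[name] = float(np.sum(power[mask]))
    return result
```

Because the delta band here starts at 0 Hz, any DC offset in the window is counted as delta power; removing the window mean first is a common choice when that is not wanted.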

Fractal dimension
Fractal dimension (20) is directly related to the amount of information in a signal and can be interpreted as the degree of meandering (or roughness or irregularity) of a signal. Consider $x(1), x(2), \ldots, x(N)$, the time sequence to be analyzed. Construct k new time series $x_m^k$ as follows:
$$x_m^k = \{x(m),\, x(m+k),\, x(m+2k),\, \ldots,\, x(m + \lfloor (N-m)/k \rfloor\, k)\},$$
where $m = 1, 2, \ldots, k$; m indicates the initial time and k the delay between points. The length $L_m(k)$ of each curve $x_m^k$ is computed as
$$L_m(k) = \frac{1}{k}\left[\left(\sum_{i=1}^{\lfloor (N-m)/k \rfloor} \left|x(m+ik) - x(m+(i-1)k)\right|\right) \frac{N-1}{\lfloor (N-m)/k \rfloor\, k}\right],$$
where N is the length of the time sequence. The total average length $L(k)$ is computed over all time series having the same delay k but different m as
$$L(k) = \frac{1}{k}\sum_{m=1}^{k} L_m(k).$$
This procedure is repeated for each k ranging from 1 to $k_{\max}$. The total average length for delay k, $L(k)$, is proportional to $k^{-D}$, where D is the fractal dimension by Higuchi's method. In the plot of $\ln(L(k))$ versus $\ln(1/k)$, the slope of the least-squares linear best fit is the estimate of the fractal dimension.
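Higuchi's procedure can be sketched as follows. This is a hedged illustration rather than the authors' code: the function name `higuchi_fd` and the default `k_max=8` are assumptions (the text does not state the maximum delay used), and NumPy is assumed.

```python
import numpy as np

def higuchi_fd(x, k_max=8):
    """Sketch of Higuchi's fractal dimension estimate (k_max is an assumed default)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    log_inv_k, log_len = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(1, k + 1):
            # 0-based indices of x(m), x(m+k), x(m+2k), ...
            idx = np.arange(m - 1, n, k)
            steps = idx.size - 1          # equals floor((N - m) / k)
            if steps < 1:
                continue
            curve = np.sum(np.abs(np.diff(x[idx])))
            # Normalized curve length L_m(k)
            lengths.append(curve * (n - 1) / (steps * k) / k)
        log_inv_k.append(np.log(1.0 / k))
        log_len.append(np.log(np.mean(lengths)))   # ln L(k), mean over m
    # Slope of ln L(k) versus ln(1/k) estimates the fractal dimension
    slope, _ = np.polyfit(log_inv_k, log_len, 1)
    return slope
```

As a sanity check, a straight line gives an estimate of 1, while white noise gives a value near 2.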

Classifier
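The main idea of the SVM (21) is to construct a hyperplane as a decision surface in such a way that the margin of separation between positive and negative examples is maximized. The support vector machine is an approximate implementation of the method of structural risk minimization. Given labeled training data $\{(x_i, y_i)\}_{i=1}^{\ell}$ with $y_i \in \{-1, +1\}$, the SVM constructs a maximal margin linear classifier in a high dimensional feature space $\phi(x)$ defined by a positive definite kernel function $k(x, x')$ specifying an inner product in the feature space,
$$k(x, x') = \phi(x) \cdot \phi(x').$$
A common kernel is the Gaussian radial basis function (RBF),
$$k(x, x') = \exp\left(-\gamma \|x - x'\|^2\right).$$
The function implemented by a support vector machine is given by
$$f(x) = \sum_{i=1}^{\ell} \alpha_i y_i\, k(x_i, x) + b.$$
To find the optimal coefficients $\alpha$ of this expansion, it is sufficient to maximize the function
$$W(\alpha) = \sum_{i=1}^{\ell} \alpha_i - \frac{1}{2} \sum_{i=1}^{\ell} \sum_{j=1}^{\ell} y_i y_j \alpha_i \alpha_j\, k(x_i, x_j)$$
subject to $0 \le \alpha_i \le C$ and $\sum_{i} \alpha_i y_i = 0$, where C is the regularization parameter.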
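To make the kernel expansion concrete, the sketch below evaluates a Gaussian RBF kernel, the dual objective and the decision function for given coefficients. It does not solve the optimization problem; the coefficients are simply supplied. The function names and the parameterization of the kernel by `gamma` are assumptions made for illustration, with NumPy assumed.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian RBF kernel matrix, k(x, x') = exp(-gamma * ||x - x'||^2)."""
    a = np.atleast_2d(np.asarray(a, dtype=float))
    b = np.atleast_2d(np.asarray(b, dtype=float))
    sq = (np.sum(a ** 2, axis=1)[:, None]
          + np.sum(b ** 2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    # Clip tiny negative values caused by rounding
    return np.exp(-gamma * np.maximum(sq, 0.0))

def dual_objective(alpha, y, kernel_matrix):
    """Dual objective: sum(alpha) - 0.5 * sum_ij y_i y_j alpha_i alpha_j k(x_i, x_j)."""
    ay = np.asarray(alpha, dtype=float) * np.asarray(y, dtype=float)
    return float(np.sum(alpha) - 0.5 * ay @ kernel_matrix @ ay)

def decision_function(x_new, x_train, y, alpha, b=0.0, gamma=1.0):
    """Kernel expansion f(x) = sum_i alpha_i y_i k(x_i, x) + b."""
    ay = np.asarray(alpha, dtype=float) * np.asarray(y, dtype=float)
    return rbf_kernel(x_new, x_train, gamma) @ ay + b
```

The sign of the decision function gives the predicted class. In practice the coefficients would come from a quadratic programming solver or an existing SVM library rather than being set by hand.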

Results
To study the difference between the ApEn of healthy and schizophrenic participants, ApEn is extracted from successive windowed signals, each lasting 2 seconds, with 50% overlap between successive frames. The resulting time series was constructed from the ApEn values calculated within these sliding windows. A trial of our EEG dataset along with its ApEn index is shown in Figure 2. For the other features, the EEG signal is divided using the same window size so that the assumption of stationarity is reasonably satisfied. For each windowed signal, we extracted AR coefficients, band power and the Higuchi fractal dimension. After feature extraction, the estimated ApEn, AR coefficients, band power and Higuchi fractal dimension for five channels are used as inputs to the SVM classifier. An important point in the validation phase is that, to avoid correlation between the training and test feature vectors, the feature vectors of one participant are used as the test set each time and the rest are used as the training set. Here, we call this the leave-one (participant)-out cross validation method.
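The windowing and validation scheme described above can be sketched as follows. This is a minimal illustration assuming a single-channel signal sampled at 200 Hz; the function names `sliding_windows` and `leave_one_participant_out` are hypothetical, and NumPy is assumed.

```python
import numpy as np

def sliding_windows(signal, fs=200, win_sec=2.0, overlap=0.5):
    """Split a 1-D signal into windows of win_sec seconds with the given overlap."""
    signal = np.asarray(signal)
    size = int(round(win_sec * fs))
    step = int(round(size * (1.0 - overlap)))
    starts = range(0, signal.size - size + 1, step)
    return np.array([signal[s:s + size] for s in starts])

def leave_one_participant_out(participant_ids):
    """Yield (participant, train_idx, test_idx), holding out one participant per fold."""
    ids = np.asarray(participant_ids)
    for pid in np.unique(ids):
        test_idx = np.flatnonzero(ids == pid)    # all windows of the held-out participant
        train_idx = np.flatnonzero(ids != pid)   # windows of every other participant
        yield pid, train_idx, test_idx
```

Splitting by participant rather than by window matters because overlapping windows from the same recording are strongly correlated; a random split over windows would place near-duplicates in both the training and test sets.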
Tables 1-5 show the mean ± standard deviation of ApEn for Cz, C3, C4, T3 and T4. These channels are studied because they are located in the temporal lobes, over the limbic area. Neuro-psychological findings indicate that the difference between the EEG indexes of schizophrenic and normal participants is most pronounced in this area (20). The classification accuracy obtained with leave-one (participant)-out cross validation using the features of these channels is shown in Table 6. To demonstrate the statistical significance of the results, the F-test and paired t-test were applied to the classification results. All calculated F-test values were higher than 1 and the P-values were less than 0.05, which confirms the significant superiority of ApEn compared to the other features.
To analyze whether the performance of each feature is biased toward one of the two groups, the sensitivity (true positive ratio) and specificity (true negative ratio) of the results are calculated using the following statistical indexes:
$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP},$$
where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively. Figure 3 depicts the classification accuracy of the employed features. It shows that ApEn is more informative than the other features for classifying the two groups.
Table 6. Classification accuracy using leave-one (participant)-out cross validation method.
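The sensitivity and specificity indexes used in this section can be computed as in the short sketch below. The function name is hypothetical, and treating the patient group as the positive class (label 1) is an assumption, since the text does not say which group is taken as positive.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels; positive class is assumed."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against empty classes to avoid division by zero
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

A large gap between the two values would indicate that a feature favors one group, which is the bias this section checks for.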

Discussion
Because the thoughts of schizophrenic patients are not complex and, in arbitrary tasks, these patients tend to be repetitive rather than using a wide variety of choices (22), a lower complexity value is expected in their EEG signals. Hence, signal entropy (which is related to the amount of chaotic behavior of a signal) is employed here to represent the complexity values in the two groups. In this study, a fast method known as ApEn is employed to extract the entropy of EEG signals, and an SVM classifier is applied to the extracted features to distinguish the two groups.
The extracted complexity values for normal subjects were remarkably higher than those of schizophrenic patients. These differences are most pronounced in the channels located over the limbic area of the brain. Anatomical and functional changes in the limbic systems of schizophrenic patients compared to those of healthy subjects have been observed in fMRI and PET images and are widely reported in the literature (23,24). Hence, only the EEGs recorded from the Cz, C3, C4, T3 and T4 channels were analyzed, to avoid redundancy.
In similar studies, features such as band power (25), fractal dimension (26,27), and AR coefficients (28) were considered as discriminative features to classify psychotic patients from controls. Most of these studies use an auditory stimulus to find a difference in the (evoked potential) response to this external input. Although some of these attempts produced significant results, none of them applied their methods to raw EEG signals. The reason is that analyzing the raw EEG is much harder than focusing only on differences in the auditory evoked potential (AEP).
For example, the band power feature is very discriminative when imagined movement (as in brain-computer interface applications) is requested from the subject in the recording protocol; otherwise, no physiological factor exists to change the discharge rate of neurons in different brain lobes in the resting condition.
AR coefficients model the temporal or spectral behavior of a signal trial. Although EEG signals are noisy and the spectrum of such a signal might be expected to vary significantly, as long as the brain state does not change, the frequency content of this noisy signal does not vary remarkably. Therefore, we do not expect to see a dramatic change in the AR coefficients between normal and schizophrenic subjects.
Owing to the irregular behavior of EEG signals, fractal dimension and entropy (complexity) based features appear to be informative. If our application were an offline process with access to a large number of samples, the results of fractal dimension and ApEn would be fairly similar. Complexity and entropy based features are closely related, in that the entropy of a signal is related to the complexity and fractal dimension of that signal (29,30,31). Moreover, ApEn estimates the entropy of a signal much faster than state-of-the-art methods of computing the fractal dimension, such as the correlation dimension, Higuchi, Hurst exponent or dominant Lyapunov exponent methods.
As can be seen in Table 6, ApEn provided a more precise result because the window length is limited and ApEn does not need a large number of samples to produce a reliable index, whereas the performance of fractal dimension is highly dependent on the length of the signal (number of samples). In addition, the ApEn index of a short signal is very fast to compute and is efficient for online decision making.
The leave-one (participant)-out cross validation method is applied to our experimental data to minimize the over-fitting effect by removing the correlation between training and test sets. Finally, ApEn achieved 88.40% accuracy in distinguishing the two groups, significantly outperforming the rival features (Fig. 3).
Another advantage of the proposed approach is that, without using beamforming or localization methods, we can identify the key areas in which the largest changes occur between the two groups. In other words, if we considered the features of all 20 channels, not only would no improvement be achieved, but the classification rate would also decrease, because the additional redundant features increase complexity while adding no further information to the features of the mentioned channels.
SVM is a powerful classifier that minimizes the structural risk while maximizing the classification accuracy. Unlike other classifiers, SVM maintains a controllable confidence margin around its decision boundary, which both reduces over-fitting and yields acceptable results when the experiment suffers from the small-sample problem. A similar study by Sabeti et al. (32) employed LDA and Adaboost classifiers to assess a larger population of controls and patients, whereas SVM enabled us to obtain the same results with far fewer samples. Obtaining similar results with a different population size indicates the capability of SVM in handling small data sets. In contrast, if LDA or Adaboost classifiers were trained with far fewer training samples (patients and controls), the performance of both classifiers would decline remarkably because they do not consider any margin while the features are learnt.
In conclusion, ApEn is introduced as a powerful feature that is fast to compute and precisely extracts discriminative information for classifying psychotic patients from controls.